Comparative Study of Various Genomic Data Sets for Protein Function Prediction and Enhancements Using Association Analysis
Abstract
The prediction of protein function is a key task in bioinformatics, and a variety of techniques and data sets have been employed for that purpose. Using the popular keyword recovery measure, which is based on the standard keyword annotations of the SwissProt database, this paper presents a comparative study of the information that different types of data sets provide for protein function prediction: phylogenetic profiles, protein interaction networks, and gene expression data. The technique is to evaluate the average keyword recovery achieved when the top (most strongly connected or most similar) pairs of proteins are taken from each data set. The results show that protein interaction data contains the most information, followed by gene expression data and then phylogenetic profiles. In addition, the average keyword recovery is computed for the top pairs derived from the raw protein interaction data using h-confidence, a measure that comes from the data mining area of association analysis. This approach gives improved results over raw protein interaction data, and even better results when applied to protein complexes that were computationally generated using the raw protein complex data. The paper also briefly discusses the fact that the different data types appear to be...
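The ranking-and-scoring procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. It assumes the pairwise form of h-confidence used in the hyperclique literature (shared interaction partners divided by the larger of the two partner counts), and it uses a Jaccard overlap of keyword sets as a simplified stand-in for keyword recovery, whose exact definition is not given here. All protein names, keywords, and function names are hypothetical.

```python
from itertools import combinations


def h_confidence(partners_a, partners_b):
    """Pairwise h-confidence: shared partners / larger partner count."""
    larger = max(len(partners_a), len(partners_b))
    if larger == 0:
        return 0.0
    return len(partners_a & partners_b) / larger


def top_pairs(interactions, k):
    """Score every protein pair by h-confidence and return the k best."""
    scored = [
        (h_confidence(interactions[a], interactions[b]), a, b)
        for a, b in combinations(sorted(interactions), 2)
    ]
    # Highest score first; ties broken by protein name for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:k]


def keyword_overlap(kw_a, kw_b):
    """ASSUMED stand-in for keyword recovery: Jaccard overlap of keywords."""
    union = kw_a | kw_b
    return len(kw_a & kw_b) / len(union) if union else 0.0


def average_overlap(pairs, keywords):
    """Mean keyword overlap across the selected protein pairs."""
    if not pairs:
        return 0.0
    total = sum(keyword_overlap(keywords[a], keywords[b]) for _, a, b in pairs)
    return total / len(pairs)


# Hypothetical toy data: each protein maps to its set of interaction partners.
interactions = {
    "P1": {"P3", "P4", "P5"},
    "P2": {"P3", "P4"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2"},
    "P5": {"P1"},
}
# Hypothetical keyword annotations in the style of SwissProt keywords.
keywords = {
    "P1": {"kinase", "membrane"},
    "P2": {"kinase"},
    "P3": {"transport", "membrane"},
    "P4": {"transport", "membrane"},
    "P5": {"nuclear"},
}

best = top_pairs(interactions, k=2)
print(best)
print(average_overlap(best, keywords))
```

Because h-confidence divides by the larger partner count, a pair scores highly only when both proteins share most of their partners, which is what makes it useful for down-weighting spurious links to highly connected proteins.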
Similar resources
Comparison of Different Statistical Methods in Genomic Selection of Holstein Cattle
Genomic selection combines statistical methods with genomic data to predict genetic values for complex traits. The accuracy of prediction of genetic values in the selected population has a great effect on the success of this selection method. The accuracy of genomic prediction is highly dependent on the statistical model used to estimate marker effects in the reference population. Various factors such a...
Comparing Different Marker Densities and Various Reference Populations Using Pedigree-Marker Best Linear Unbiased Prediction (BLUP) Model
For genomic selection to be applied successfully, the reference population and marker density should be chosen properly. The purpose of this study was to investigate the accuracy of genomic estimated breeding values at low (5K), intermediate (50K) and high (777K) marker densities in simulated populations, under different scenarios for selecting the reference populations. Af...
Comparative genomics for reliable protein-function prediction from genomic data.
Genomic data provide invaluable, yet unreliable information about protein function. However, if the overlap in information among various genomic datasets is taken into account, one observes an increase in the reliability of the protein-function predictions that can be made. Recently published approaches achieved this either by comparing the same type of data from multiple species (horizontal co...
Imputation of parent-offspring trios and their effect on accuracy of genomic prediction using Bayesian method
The objective of this study was to evaluate the imputation accuracy of parent-offspring trios under different scenarios. Using simulated datasets, the performance of Bayesian LASSO in genomic prediction was also examined. The genome consisted of 5 chromosomes, each 1 Morgan long. The number of SNPs per chromosome was 10000. One hundred QTLs were randomly distributed a...
O-3: Drug Repositioning by Merging Gene Expression Data Analysis and Cheminformatics Target Prediction Approaches
The transcriptional responses of drug treatments, combined with a protein target prediction algorithm, were utilised to associate compounds to biological genomic space. This enabled us to predict efficacy of compounds in cMap and LINCS against 181 databases of diseases extracted from GEO. 18/30 of top drugs predicted for leukemia (e.g. Leflunomide and Etoposide) and breast cancer (e.g. Tamoxifen a...
Publication date: 2007